Abstract

This research evaluated differences in the psychometric quality of supervisor
vs. observer performance ratings. Specifically, type of rater (supervisor vs.
observer) and type of instructions (rating vs. neutral instructions) were
manipulated in a 2 X 2 factorial design to compare the traditional laboratory
performance appraisal approach with a more realistic experimental design.

Results indicated supervisors demonstrated more halo and leniency error in
rating subordinates' behavior than did observers. Type of instructions given to
the raters had no effect.
Enhancing the External Validity of Laboratory
Performance Appraisal Studies

For a laboratory experiment on performance appraisal to be externally
valid, it should be as similar to the actual work setting as possible. Feldman
(1981) suggests that the laboratory evaluator should be required to perform
other tasks in addition to observing the ratee, since a supervisor in an actual
work setting would not be able to concentrate solely on the employee's
performance. Similarly, Banks and Murphy (1985) have emphasized the importance
of incorporating contextual variables present in organizational settings into
laboratory experiments on performance appraisals. The traditional laboratory
experiment, in which the subject is informed that s/he will be evaluating a
stimulus person and subsequently views a videotape of that person performing
various tasks (Bigoness, 1976; Murphy, Martin & Garcia, 1982), is missing some
crucial characteristics of the actual work setting. As a result, the entire
cognitive process induced by this experimental procedure may be different from
the one which would generally be employed by the rater in an actual work
setting.

By instructing the subjects that their sole task in the experiment is to
rate the performance of a target person, the experimenter is forcing the raters
to attend to a specific type of stimulus input. The experimenter has induced a
controlled process of attention in the subject. A rater in an actual work
setting is concerned with a number of tasks, each of which requires some degree
of attention. It is therefore likely that any information available to the
rater in this setting will have been encoded by means of an automatic process of
attention (Schneider & Shiffrin, 1977). The manner in which information is
encoded will strongly influence the availability of that information for
subsequent judgments of the stimulus person (Srull & Wyer, 1979). A controlled
process of attention should result in the specific behaviors observed by the
rater being available during the subsequent performance appraisal. An automatic
process of attention should result in the information being encoded into general
schemas. The performance ratings in these instances will be influenced by the
prototypes of the categories in which the information was stored, and should
therefore exhibit rating errors (halo and leniency) based on a global impression
of the ratee (Nathan & Lord, 1983).

The present experiment employed a 2 X 2 factorial design (rating
instructions vs. neutral instructions X supervisor vs. observer) to compare the
traditional laboratory performance appraisal approach with a more realistic, and
therefore externally valid, experimental design. The dependent variables were
the amount of halo and leniency errors committed on the rating forms. It was
hypothesized that: 1) the supervisors would exhibit more halo and leniency
errors in their ratings than the observers, and 2) the subjects receiving the
neutral instructions would exhibit more halo and leniency errors than the
subjects in the rating instructions condition.

Method 

Subjects 

One hundred and fifty introductory psychology students volunteered for the
study and received experimental credit for participating.

Task 

All work groups performed a manufacturing game developed by Foti (1981).
The purpose of the game is for the group to manufacture as many models as
possible in the shortest time, in order to maximize profits. The subjects start
out with $3,000 (play money), and are given a price list for raw materials and
the finished products. The game requires some strategy in terms of deciding what
to manufacture (there are instructions for 2 different models), the amount of
raw materials to buy, and how to divide up the labor.

Procedure 

Five subjects were run in each experimental session. One individual was
randomly assigned to view the group on a monitor in a separate room, and one of
the remaining four subjects was randomly assigned the role of supervisor for the
group. Once the subjects were seated, they were either told that they would be
asked to rate the performance of the group members on the task (rating
instructions) and then instructed to follow the task directions as closely as
possible, or they were merely instructed to follow the task directions as
closely as possible (neutral instructions). The subjects were then given 10
minutes to read the instructions and discuss how they would perform the task.
It was the supervisor's responsibility to assign tasks, keep a running tab of
monies spent on raw materials and monies earned, and evaluate the quality
of each model produced. The person viewing the monitor received the same
instructions. The work group was then given two 15-minute sessions to construct
the models while the observer in the other room watched on the monitor. At the
end of the second session the experimenter administered the questionnaire to
all five of the subjects.
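
The role assignment described above can be sketched in a few lines of Python. This is only an illustration of the stated procedure (one observer drawn at random, then one supervisor drawn at random from the remaining four); the function name `assign_roles` and the subject labels are hypothetical and do not come from the original study materials.

```python
import random


def assign_roles(subject_ids, rng):
    """Randomly split one five-person session into observer, supervisor,
    and three subordinates (hypothetical helper, not from the study)."""
    if len(subject_ids) != 5:
        raise ValueError("each session runs exactly five subjects")
    pool = list(subject_ids)
    # A uniform shuffle followed by taking the first two positions is
    # equivalent to drawing the observer at random and then drawing the
    # supervisor at random from the remaining four subjects.
    rng.shuffle(pool)
    observer, supervisor, *subordinates = pool
    return {
        "observer": observer,
        "supervisor": supervisor,
        "subordinates": sorted(subordinates),
    }


# Hypothetical subject labels for one session.
session = ["s1", "s2", "s3", "s4", "s5"]
roles = assign_roles(session, random.Random(0))
```

Passing in a seeded `random.Random` instance keeps the assignment reproducible for a given session.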

Dependent  Variables 

Rating Scales. The measure of subordinate behavior consisted of 10
performance dimensions which were rated on 5-point Likert scales with anchors of
poor and excellent. The dimensions were obtained from a pilot study using 15
subjects (three 6-person groups) who performed the manufacturing game and
subsequently listed what they felt were relevant performance dimensions. The
dimensions used are as follows: behavior flexibility, quality of
decision-making, organizing/planning, delegatory skills, communication skills,
construction skills, idea contribution, cooperation, quality of product and an
overall evaluation of each subordinate.

Halo and leniency. Halo was operationalized as a subject's (either
supervisor or observer) standard deviation across all nine performance
dimensions for each subordinate (Saal, Downey & Lahey, 1980). Less dispersion
among the dimension ratings, as evidenced by smaller standard deviations,
indicates a greater halo effect. Leniency was operationalized as simply the
average dimension rating for each subordinate.
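
As a minimal sketch, the two measures can be computed as follows for one rater's ratings of one subordinate. Two details are assumptions, since the text does not specify them: the sample standard deviation (n - 1 denominator) is used, and leniency is averaged over the same nine dimensions as halo, excluding the overall evaluation. The ratings shown are made up for illustration only.

```python
from statistics import mean, stdev


def halo_score(dimension_ratings):
    """Standard deviation across the nine dimension ratings.
    Smaller values indicate a stronger halo effect.
    Assumption: sample SD (n - 1); the source does not say which SD was used."""
    return stdev(dimension_ratings)


def leniency_score(dimension_ratings):
    """Average dimension rating; higher values indicate more leniency.
    Assumption: the overall evaluation item is excluded."""
    return mean(dimension_ratings)


# Made-up ratings on the 5-point scale (1 = poor, 5 = excellent).
uniform_ratings = [4, 4, 4, 4, 4, 4, 4, 4, 4]   # no dispersion at all
varied_ratings = [1, 2, 3, 4, 5, 3, 3, 2, 4]    # differentiated ratings

halo_uniform = halo_score(uniform_ratings)
halo_varied = halo_score(varied_ratings)
leniency_uniform = leniency_score(uniform_ratings)
leniency_varied = leniency_score(varied_ratings)
```

Under this operationalization the uniform rater shows the maximum possible halo (a standard deviation of zero), while the varied rater differentiates among dimensions and so shows less halo.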

Data  Analysis  Procedures 

For  the  halo  measure,  a  2  X  2  X  3  (instructions  X  type  of  rater  X 
subordinates)  fixed-factor  ANOVA  with  repeated  measures  on  the  latter  two 
factors  was  performed.  For  the  leniency  measure,  a  2  X  2  (instructions  X  type 
of  rater)  ANOVA  with  repeated  measures  on  the  rater  factor  was  performed.  For 
both  analyses,  groups  were  the  unit  of  analysis.  This  was  necessary  because 
task  groups  were  not  constant  across  each  supervisor/observer  pair. 
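
One plausible reading of "groups were the unit of analysis" for the leniency measure is that each group contributes a single row containing one supervisor score and one observer score, each averaged over the three subordinates. The sketch below illustrates that data layout only; the averaging step, the function name `group_leniency_row`, and the example ratings are assumptions for illustration and are not taken from the original analysis.

```python
from statistics import mean


def group_leniency_row(group):
    """Collapse one group's ratings into a single row with one leniency
    score per rater type, averaged over the three subordinates
    (assumed aggregation; hypothetical helper)."""
    row = {"instructions": group["instructions"]}
    for rater in ("supervisor", "observer"):
        per_subordinate = [mean(ratings) for ratings in group[rater]]
        row[rater] = mean(per_subordinate)
    return row


# Made-up data for one group: nine dimension ratings for each of the
# three subordinates, from the supervisor and from the observer.
example_group = {
    "instructions": "rating",
    "supervisor": [
        [4, 4, 5, 4, 4, 4, 5, 4, 4],
        [4, 3, 4, 4, 4, 3, 4, 4, 4],
        [5, 4, 4, 4, 5, 4, 4, 4, 4],
    ],
    "observer": [
        [3, 4, 4, 3, 4, 3, 4, 3, 4],
        [3, 3, 4, 3, 3, 3, 4, 4, 3],
        [4, 4, 3, 4, 4, 3, 4, 3, 4],
    ],
}

row = group_leniency_row(example_group)
```

With 30 such rows (150 subjects run five per session), the instructions condition varies between groups and type of rater varies within groups, which matches the repeated-measures structure described above.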

Results 

It was hypothesized that supervisors would exhibit a stronger halo effect
than observers. Results of the 2 X 2 X 3 ANOVA indicated a significant main
effect for type of rater, F(1, 28) = 4.06, p < .06, eta = .04, with mean
standard deviations of .56 and .69 for supervisors and observers respectively
(smaller values indicate more halo). Additionally, it was hypothesized that
supervisors would also exhibit more leniency. Results of the 2 X 2 ANOVA showed
a significant main effect for type of rater, F(1, 28) = 13.77, p < .001,
eta = .12, with means of 3.89 and 3.51 respectively for supervisors and
observers. Thus, both components of hypothesis 1 were supported.

Our second hypothesis, that subjects in the neutral instructions condition
would exhibit more halo and leniency error than subjects in the rating
instructions condition, was not supported. The effect of type of instructions
on halo error in the 2 X 2 X 3 ANOVA was nonsignificant, F(1, 28) = .73,
p < .39, as was its effect on leniency error in the 2 X 2 ANOVA,
F(1, 28) = .84, p < .38.

Discussion 

The results of the present study suggest that there is a significant
difference in the psychometric quality of performance ratings given by active
supervisors versus passive observers. We found that supervisory ratings
exhibited more halo and more leniency error. Thus, this study lends support to
the contention that laboratory evaluators cannot be made to focus exclusively on
the stimulus person's behavior (Banks & Murphy, 1985; Feldman, 1981), if the
results are to be directly (rather than theoretically) relevant to employment
settings.

Our second hypothesis, that the type of instructions given to raters would
affect the psychometric quality of the ratings, was not supported. It may be
that, given a laboratory situation, any type of instructions will induce
controlled information processing, and only by making the observations part of
some other task that the person is doing will automatic processing occur.

In conclusion, although the present research has demonstrated the
difference in halo and leniency errors between supervisor and observer ratings,
the issue of accuracy was not addressed. Given that there may be a weak
positive relationship between certain rating errors (e.g., halo) and accuracy
(Cooper, 1981), future research should focus on this issue.